Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
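
To make the construction concrete, the following is a minimal PyTorch sketch of a DiCE-style "magic box" operator and a per-step surrogate objective built from advantages, in the spirit of Loaded DiCE. The function names (magic_box, loaded_dice_objective) and the tensor layout are illustrative assumptions, not the authors' reference implementation.

import torch

def magic_box(x):
    # DiCE "magic box": evaluates to 1 in the forward pass, but its derivatives
    # (at any order, under automatic differentiation) reintroduce the score-function term.
    return torch.exp(x - x.detach())

def loaded_dice_objective(log_probs, advantages):
    # Sketch of a Loaded-DiCE-style surrogate (not the authors' reference code).
    # log_probs:  tensor [T] of log pi(a_t | s_t), differentiable w.r.t. policy parameters
    # advantages: tensor [T] of advantage estimates A(s_t, a_t), treated here as fixed
    deps_incl = torch.cumsum(log_probs, dim=0)   # sum_{t' <= t} log pi(a_{t'} | s_{t'})
    deps_excl = deps_incl - log_probs            # sum_{t' <  t} log pi(a_{t'} | s_{t'})
    # Each term's first derivative is grad log pi(a_t | s_t) * A_t, while the magic box
    # keeps the dependencies on earlier actions that higher-order derivatives require.
    per_step = (magic_box(deps_incl) - magic_box(deps_excl)) * advantages.detach()
    return per_step.sum()

Note that this surrogate evaluates to zero in the forward pass; only its derivatives, taken by automatic differentiation, carry the gradient estimates.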



Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

The DiCE gradient estimator [1] allows the computation of higher-order derivatives in stochastic computation graphs. This may be useful in contexts such as multi-agent learning or meta-RL, where the proper application of methods such as MAML requires the computation of second-order derivatives. The current paper extends DiCE and derives a more general objective that allows integration of the advantage A(s_t, a_t) = Q(s_t, a_t) - V(s_t) in order to control the variance while providing unbiased estimates. The advantage can be approximated by trading off variance for bias using parametric function approximators and methods such as Generalized Advantage Estimation (GAE). Moreover, the authors propose to further control the variance of the higher-order gradients by discounting the impact of past actions on the current advantage, thus limiting the range of causal dependencies. This paper is well executed: it is well written, technically sound and potentially impactful.
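
For readers unfamiliar with GAE, the sketch below shows one standard way to compute such advantage estimates from rewards and a learned value function. The function name gae_advantages and the tensor shapes are assumptions for illustration; this is not code from the paper.

import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation (illustrative sketch).
    # rewards: tensor [T]   of rewards r_t
    # values:  tensor [T+1] of value estimates V(s_0), ..., V(s_T); the last entry bootstraps
    # Returns: tensor [T]   of advantage estimates A_t, treated as fixed targets.
    values = values.detach()
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals delta_t
    advantages = torch.zeros_like(rewards)
    gae = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        gae = deltas[t] + gamma * lam * gae                # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = gae
    return advantages

Larger lam yields lower bias but higher variance; smaller lam leans more heavily on the (possibly biased) value function.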


Reviews: Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Neural Information Processing Systems

This paper presents a novel methodology that, in combination with automatic differentiation, yields unbiased, low-variance estimators of derivatives at any order. It appears to be potentially widely useful, and the exposition is clear. The reviewers and I seem to be in general agreement in liking the paper. Reviewer 1 wrote a thorough review touching on many aspects of the paper. The overall score was 7, and their bottom-line positive was: "This paper is well executed: it is well written, technically sound and potentially impactful."


Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

Neural Information Processing Systems

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.


Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

Farquhar, Gregory, Whiteson, Shimon, Foerster, Jakob

arXiv.org Machine Learning

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our objective in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
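
One way to read "discounting the impact of more distant causal dependencies" is to down-weight the log-probabilities of older actions by a factor lambda**(t - t') before applying the magic-box operator. The sketch below illustrates that idea only; the exact weighting scheme is the one defined in the paper, and the helper discounted_dependencies is hypothetical.

import torch

def discounted_dependencies(log_probs, lam):
    # Exponentially discounted sums of past log-probabilities (illustrative).
    # Returns two tensors of shape [T]:
    #   incl[t] = sum_{t' <= t} lam**(t - t') * log_probs[t']
    #   excl[t] = sum_{t' <  t} lam**(t - t') * log_probs[t']
    incl = []
    running = torch.zeros(())
    for lp in log_probs:
        running = lam * running + lp
        incl.append(running)
    incl = torch.stack(incl)
    excl = incl - log_probs
    return incl, excl

# Usage (hypothetical), reusing magic_box from the earlier sketch:
#   incl, excl = discounted_dependencies(log_probs, lam=0.9)
#   surrogate = ((magic_box(incl) - magic_box(excl)) * advantages.detach()).sum()
# With lam = 1 this recovers the full-dependency objective; with lam = 0 the higher-order
# dependencies on past actions vanish, trading variance reduction for bias.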